In this Python notebook, you'll create an interactive map of the United States that shows four levels of population density. You'll extract U.S. Census statistics on zip code areas, population counts, and median housing age. You'll join those statistics into a single DataFrame and calculate population density per square kilometer. Then you'll run some SQL-like queries on the DataFrame to classify the zip codes into the four categories of interest. Finally, you'll create an interactive map using Mapbox technology.
This notebook runs on Python 2.7 with 2.1.
The categories you'll use to describe population density are based on an academic study of urban structure and density, described in the June 2014 article "From Jurisdictional to Functional Analysis of Urban Cores & Suburbs."
This article groups population into four categories that are based on population density and age of the houses:
Import the pandas, numpy, and os libraries:
import pandas as pd, numpy as np, os
You'll use data from the 2013 U.S. Census American Community Survey (ACS) 5-year estimates.
You're using this particular version of the ACS for these reasons:
You'll get the data sets and combine them:
2.1 Get zip code areas
2.2 Get population and age by zip code
2.3 Get the housing age data
2.4 Join the data sets
2.5 Rename the columns
To get the zip code areas:
GEO_URL = "YOUR ACCESS CODE"
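# np.str and np.int are removed in recent NumPy releases; str and np.int64 work on old and new versions
geo_df = pd.read_csv( GEO_URL, usecols=['GEOID_Data','ALAND'], dtype={"GEOID_Data": str, "ALAND": np.int64} )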
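# Rename by name rather than by position, because read_csv keeps the file's column order, not the usecols order
geo_df = geo_df.rename(columns={'GEOID_Data': 'GEOID'})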
geo_df = geo_df.set_index('GEOID')
geo_df.head()
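Reading the identifier column as a string matters whenever identifiers can start with a zero, as many zip codes do. The following is a minimal sketch of that effect, assuming a hypothetical two-row CSV with zero-padded identifiers; the identifier format in the actual data set may differ.

```python
import io
import pandas as pd

# Hypothetical two-row CSV; the real file's columns and ID format may differ.
csv_text = "GEOID_Data,ALAND\n02139,4054000\n90210,14870000\n"

# Default type inference turns the identifier into an integer and drops the leading zero.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing str keeps the identifier exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"GEOID_Data": str})

inferred_ids = inferred["GEOID_Data"].tolist()
text_ids = as_text["GEOID_Data"].tolist()
print(inferred_ids)
print(text_ids)
```

Keeping identifiers as strings also ensures that the later joins match on exactly the same key values in every data set.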
Get a data access link for Population and age by zip code and paste it into the next cell.
POP_URL = "YOUR ACCESS CODE"
pop_df = pd.read_csv( POP_URL, usecols=['GEOID','B01001e1'], dtype={"GEOID": str} )
# Rename by name so the result does not depend on the column order in the file
pop_df = pop_df.rename(columns={'B01001e1': 'POPULATION'})
pop_df = pop_df.set_index('GEOID')
pop_df.head()
Get a data access link for Housing (2015) and paste it into the next cell.
HOUSE_URL = "YOUR ACCESS CODE"
housing_df = pd.read_csv( HOUSE_URL, usecols=['GEOID','B25035e1'], dtype={"GEOID": str} )
housing_df = housing_df.set_index('GEOID')
housing_df.sample(5)
Join the three data sets into a data set named urban_df:
urban_df = geo_df.join(pop_df)
urban_df = urban_df.join(housing_df)
urban_df.sample(5)
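DataFrame.join aligns rows on the index and, by default, keeps every row of the left-hand frame. A minimal sketch with made-up identifiers and values shows what happens when the right-hand frame has no matching row:

```python
import pandas as pd

# Toy frames indexed by a made-up GEOID; values are illustrative only.
areas = pd.DataFrame({"ALAND": [4000000, 9000000, 2500000]},
                     index=pd.Index(["A", "B", "C"], name="GEOID"))
pops = pd.DataFrame({"POPULATION": [12000, 300]},
                    index=pd.Index(["A", "B"], name="GEOID"))

# join() aligns on the index and keeps every row of the left frame by default.
joined = areas.join(pops)

# Rows with no match on the right side get NaN in the new column.
missing = joined["POPULATION"].isna().sum()
print(joined)
print("rows without population: {}".format(missing))
```

Counting the NaN values after each join is a quick way to see how many areas lack population or housing data.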
Give the columns more meaningful names:
urban_df.columns = ['AREAMSQ','Population','MEDYRBUILT']
urban_df.sample(5)
You'll find the population density and assign a category for each area.
Calculate the population density per square kilometer: persons per square km = persons / (area in square meters / 1,000,000)
urban_df['POPPERKMSQ'] = urban_df['Population'] / (urban_df['AREAMSQ']/1000000)
urban_df.sample(4)
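As a worked example of the formula, the sketch below uses two made-up areas. It also assumes that some areas could report zero land area, which the source data may or may not contain; in that case the division yields infinity, and one option is to replace it with NaN.

```python
import numpy as np
import pandas as pd

# Toy values: one ordinary area and one with zero land area (an assumption).
toy = pd.DataFrame({"Population": [29000, 50],
                    "AREAMSQ": [10000000, 0]},
                   index=["A", "B"])

# 10,000,000 square meters is 10 square km, so 29,000 people gives 2,900 per square km.
toy["POPPERKMSQ"] = toy["Population"] / (toy["AREAMSQ"] / 1000000)

# Dividing by a zero area yields inf; replacing it with NaN keeps summary statistics meaningful.
toy["POPPERKMSQ"] = toy["POPPERKMSQ"].replace([np.inf, -np.inf], np.nan)
print(toy)
```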
Assign a category to each area based on its population density and median housing age:
urban_df['CAT'] = 'EXURBAN'
# Use .loc so each assignment modifies urban_df itself rather than a temporary copy
urban_df.loc[urban_df['POPPERKMSQ'] >= 2900, 'CAT'] = 'URBAN'
urban_df.loc[(urban_df['POPPERKMSQ'] < 2900) & (urban_df['POPPERKMSQ'] >= 100) & (urban_df['MEDYRBUILT'] < 1980) & (urban_df['MEDYRBUILT'] >= 1946), 'CAT'] = 'SUBURBANEARLY'
urban_df.loc[(urban_df['POPPERKMSQ'] < 2900) & (urban_df['POPPERKMSQ'] >= 100) & (urban_df['MEDYRBUILT'] >= 1980), 'CAT'] = 'SUBURBANLATE'
urban_df.describe()
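The same thresholds can also be written as an ordered list of conditions and evaluated in one pass. This is an alternative sketch using numpy.select on made-up rows, not the notebook's own method; it assumes median build years are whole numbers, so "between 1946 and 1979" matches "at least 1946 and before 1980".

```python
import numpy as np
import pandas as pd

# Toy rows chosen to hit each category; values are illustrative only.
toy = pd.DataFrame({"POPPERKMSQ": [5000.0, 1500.0, 800.0, 20.0],
                    "MEDYRBUILT": [1950, 1965, 1995, 1990]})

# Shared density band for both suburban categories.
suburban = (toy["POPPERKMSQ"] >= 100) & (toy["POPPERKMSQ"] < 2900)
conditions = [
    toy["POPPERKMSQ"] >= 2900,
    suburban & toy["MEDYRBUILT"].between(1946, 1979),
    suburban & (toy["MEDYRBUILT"] >= 1980),
]
labels = ["URBAN", "SUBURBANEARLY", "SUBURBANLATE"]

# np.select picks the first matching label per row and falls back to the default.
toy["CAT"] = np.select(conditions, labels, default="EXURBAN")
print(toy["CAT"].value_counts())
```

Calling value_counts() on the category column gives a compact check that every category received some rows.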
Look at a few records to do a quick sanity check:
urban_df.sample(10)
You'll convert the data to JSON format and create a JavaScript variable to visualize the data in a browser.
Convert the data to JSON format:
json_data_from_python = urban_df.reset_index().to_json(orient="records")
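To see what the "records" orientation produces, here is a minimal sketch on a made-up two-row frame shaped like urban_df. The output is a JSON array with one object per row, and reset_index() is what makes GEOID appear as a field in each object.

```python
import json
import pandas as pd

# Toy frame indexed by a made-up GEOID, mirroring the shape of urban_df.
toy = pd.DataFrame({"CAT": ["URBAN", "EXURBAN"]},
                   index=pd.Index(["A", "B"], name="GEOID"))

# reset_index() turns GEOID back into a column so it appears in every record.
payload = toy.reset_index().to_json(orient="records")

# The result is a JSON array of objects, one per row.
records = json.loads(payload)
print(records)
```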
Create a JavaScript variable called vizObj for your JSON data. The data object vizObj is a global variable in your window that you could pass to another JavaScript function call.
from IPython.display import Javascript
Javascript("""window.vizObj={};""".format(json_data_from_python))
%%javascript
require.config({
paths: {
mapboxgl: 'https://api.tiles.mapbox.com/mapbox-gl-js/v0.39.1/mapbox-gl',
bootstrap: 'https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min'
}
});
IPython.OutputArea.auto_scroll_threshold = 9999;
require(['mapboxgl', 'bootstrap'], function(mapboxgl, bootstrap){
mapboxgl.accessToken = 'pk.eyJ1IjoicmFqcnNpbmdoIiwiYSI6ImpzeDhXbk0ifQ.VeSXCxcobmgfLgJAnsK3nw';
var map = new mapboxgl.Map({
container: 'map', // HTML element id in which to draw the map
style: 'mapbox://styles/mapbox/light-v9', //mapbox map to use
center: [-71.09, 42.44], // starting center position
zoom: 9 // starting zoom
});
var densitytypes = ["URBAN", "SUBURBANEARLY", "SUBURBANLATE", "EXURBAN"];
var densitycolors = ["#d7301f", "#fc8d59", "#fdcc8a", "#fef0d9"];
// Join local JSON data with vector tile geometry
var maxValue = 71227;
var data = vizObj;
// Get the vector geometries to join
// US Census Data Source
// https://www.census.gov/geo/maps-data/data/cbf/cbf_state.html
var mapId = "rajrsingh.bjb1ffhz";
var layerName = "zipsimple0025-btzfjd";
var vtMatchProp = "GEOID_Data";
var dataMatchProp = "GEOID";
var dataStyleProp = "CAT";
map.on('load', function() {
// Add source for zip code polygons hosted on Mapbox
map.addSource("zips", {
type: "vector",
url: "mapbox://" + mapId
});
// First value is the default, used where there is no data
var stops = [["0", "#888888"]];
// Look up the color for each zip code area based on its density category
data.forEach(function(row) {
if (densitytypes.indexOf(row[dataStyleProp]) >= 0 ) {
var color = densitycolors[densitytypes.indexOf(row[dataStyleProp])];
stops.push([row[dataMatchProp], color]);
}
});
// Add layer from the vector tile source with data-driven style
map.addLayer({
"id": "zips-join",
"type": "fill",
"source": "zips",
"source-layer": layerName,
"paint": {
"fill-color": {
"property": vtMatchProp,
"type": "categorical",
"stops": stops
},
"fill-antialias": true,
"fill-outline-color": "rgba(255, 255, 255, 1)"
}
}, 'waterway-label');
});
});
element.append('<div id="map" style="position:relative;top:0;bottom:0;width:100%;height:400px;"></div>');
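The color lookup that the JavaScript cell builds in its forEach loop can be previewed in Python before rendering. This is a minimal sketch of the same logic, using a hypothetical list of records in place of the real JSON data:

```python
# Category names and colors copied from the JavaScript cell.
density_types = ["URBAN", "SUBURBANEARLY", "SUBURBANLATE", "EXURBAN"]
density_colors = ["#d7301f", "#fc8d59", "#fdcc8a", "#fef0d9"]
color_by_type = dict(zip(density_types, density_colors))

# Hypothetical records in the same shape as the JSON passed to the browser.
records = [{"GEOID": "A", "CAT": "URBAN"},
           {"GEOID": "B", "CAT": "EXURBAN"},
           {"GEOID": "C", "CAT": None}]

# Start with the default stop, then add one [GEOID, color] pair per known category.
stops = [["0", "#888888"]]
for row in records:
    if row["CAT"] in color_by_type:
        stops.append([row["GEOID"], color_by_type[row["CAT"]]])
print(stops)
```

Records whose category is missing or unrecognized add no stop, so those areas are drawn in the default gray.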
The map is centered on the Boston area. You can zoom and pan the map to see any area of the United States. Note: the generated map is not available in preview mode.
Views from around the country each tell different stories about the composition of urban areas. Combine this map with your own data to discover deeper insights into customers or constituents.
Learn more:
Raj Singh is a Developer Advocate and Open Data Lead at IBM Cloud Data Services. He specializes in all things geospatial and hacks on analytics in R/IBM Db2 Warehouse on Cloud and Spark/iPython notebooks.